Estimating Size of Search Engines in an Uncooperative Environment
Authors
Abstract
The size of a search engine is the number of documents it indexes. A metasearch engine needs the size of each underlying search engine to perform search engine selection, result merging, and several other processes. Estimating these sizes effectively is therefore important for any metasearch engine that incorporates multiple autonomous search engines. In this paper, we propose an algorithm that estimates search engine size more accurately than existing methods without losing efficiency. Compared to the Sample-Resample approach, the best-known approach in the literature, our technique also shows much better tolerance to unfavorable environments.
Similar resources
Resource Selection for Federated Search on the Web
A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real sear...
Estimating Collection Size in Distributed Search
Distributed search is an effective way to search information over thousands of information collections available on the web. As an important feature in distributed search, collection size plays a vital role in resource representation and selection. This paper proposes two novel algorithms to estimate collection size in uncooperative environments. Sample high frequent resample (SHFRS) algorithm ...
Evaluation of a Recursive Weighting Scheme for Federated Web Search
The informative resources available on the Web are not always directly accessible and cannot therefore be crawled since access is permitted only through the adoption of appropriate services, e.g. specialized search engines. On the other hand, specialized search engines can help address the problem of heterogeneity of the informative resources due to the type of content, the structure or the med...
Engine Classification Using Vibrations Measured by Laser Doppler Vibrometer on Background Surfaces
In our previous studies, vehicle surfaces’ vibrations caused by operating engines measured by Laser Doppler Vibrometer (LDV) have been effectively exploited in order to classify vehicles of different types, e.g., vans, 2-door sedans, 4-door sedans, pick-ups and buses, and different types of engines, such as Inline-four engines, V-6 engines, 1-axle diesel engines and 2-axle diesel engines. The r...
A Survey of the Websites of Iran's Medical Sciences Universities Based on Webometric Indicators
Introduction: Webometrics ranking shows the amount of scientific and educational activities of universities and organizations annually. This study was an attempt to rank medical universities in Iran via three search engines. Methods: This applied-descriptive study used webometric methods to survey 43 websites of medical universities in Iran. The three indexes of size, visibility and rich f...
Journal title:
Volume, issue:
Pages: -
Publication date: 2004